Standard Evaluation Versus Non-Standard Evaluation in R

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There is a lot of unnecessary worry over “Non Standard Evaluation” (NSE) in R versus “Standard Evaluation” (SE, or standard “variables names refer to values” evaluation). This very author is guilty of over-discussing the issue. But let’s give this yet another try.

The entire difference between NSE and regular evaluation can be summed up in the following simple table (which should be clear after we work some examples).

Tbl

Standard Evaluation

In standard (or value oriented evaluation) code you type in is taken to be variable names, functions, names, operators, and even numeric literal values. String values or literals need extra marks, such as quotes.

Let’s look at this very slowly. First we set up a couple of variables.

d <- data.frame(x = 1, y = 2)

x <- "y"

Now if we want to extract the column named by x (which happens to be "y") we write code such as the following.

d[, x, drop=FALSE]
#   y
# 1 2

R treats by default the symbol x as a variable name and uses the value associated with it (in this case "y") to select a data column by name.

If we want to name the "x" directly (or treat "x" as a literal value) we add a small bit of extra notation: quote marks. This looks like the following.

d[, "x", drop=FALSE]
#   x
# 1 1

Notice this time we extracted the column named "x".

To summarize: in standard evaluation user written (or source code) names are treated as references to values, and to form string literals one adds quotes.

Non Standard Evaluation

In NSE (non-standard evaluation) the roles are reversed. Names are treated as literal string values, unless extra marks are added.

Non-standard techniques have been built into R quite some time. For example:

bquote(x)
# x

bquote(.(x))
# [1] "y"

Notice how in the first example bquote() captured the name “x“, which would allow code to use this name as a value. In the second example x was substituted for its associated value: the character string "y".

dplyr instead of using the existing R/bquote() methodology (re-)invented its own notation “!!“. For example x written alone is the string "x".

suppressPackageStartupMessages(library("dplyr"))

select(d, x)
#   x
# 1 1

Notice this extracted the column named "x", and not the column named "y" ("y" being the value associated with the variable x).

In contrast, if we want to use the value associate with x in NSE we add the extra de-quoting symbols !!, as we show here.

select(d, !!x)
#   y
# 1 2

Notice this extracted the column named "y" ("y" being the value associated with the variable x).

Though in some cases in the rlang/dplyr argot, one must sometimes first prepare a variable or value for de-quoting using one of sym(), ensym(), vars(), quo(), enquo(), quo_name(), quos(), enquos(), expr(), or enexpr(). Which to use depends on some combination of what you are trying to quote, where the value comes from (user code, function argument), what type of execution situation you are in (in function body, or not), execution rules (dplyr/tidyselect select rules versus non-select rules), and what version of rlang you are using.

There is nothing magic about the “!!” notation. rlang could just as easily chosen many others (one rumor is that a {{}} notation is coming, to push rlang closer to glue‘s {} string-interpolation notation). One possible notation would have been the .() notation already used in R, bquote(), and data.table (though data.table uses .() for something closer to list construction/quoting instead of marking for evaluation).

For instance a .()-enabled version of dplyr can be easily simulated as follows.

select <- wrapr::bquote_function(dplyr::select)

select(d, x)
#   x
# 1 1

select(d, .(x))
#   y
# 1 2

Note, the above (while it works) is just for demonstration. dplyr is designed to use !! not .().

To summarize: in non-standard evaluation user written (or source code) names are treated as string literals, and to reference variable values one adds an escaping notation.

Conclusion

NSE defaults to quoting names to make them string literals, whereas SE defaults to expecting unquoted variable names and requiring quotes around string literals.

With the above examples under your belt, you should be able to see the simple point of the initial table: NSE and SE are just complimentary notations. Neither notation is more powerful than the other, and R has had good tools to use both notations well before rlang was started.

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)